Document limitation of parsing body fragments only, rename processFragment method #4
Merged
The HTML5 specification defines different insertion modes and tokenizer states depending on where content appears in an HTML document. For example, content inside <textarea> or <title> tags is parsed differently from content in the <body>. The PHP DOM API for HTML5 parsing doesn't currently expose a documented way to create fragments with the context information needed to handle all of these parsing states correctly.

So this PR renames the processFragment() method to processBodyFragment() to make the limitation explicit in the API itself. It also changes the fragment parsing implementation from a container-element-based approach to using the body element directly, so that the "in body" insertion mode applies. A cautionary notice is added to the README.

The previous implementation could produce incorrect results for fragments from <head> sections (such as <title> tags) or other contexts where different parsing rules apply, as well as for fragments that themselves contain <head> and/or <body> tags. Skipped tests demonstrate the issue, which will hopefully make it easier to fix properly once PHP's DOM API adds support for fragment context.
For now, explicitly limiting usage to the <body> seems a safer choice, and one that is easier to change in the future, than adding potentially quirky detection logic to the implementation that would try to handle the different cases (from head and/or body) correctly.
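To illustrate what parsing with the body element as context means, here is a minimal sketch. It assumes PHP 8.4's spec-compliant Dom\HTMLDocument API, where assigning Dom\Element::$innerHTML runs the HTML5 fragment parsing algorithm with that element as the context element. The helper name parseBodyFragmentSketch() is hypothetical; this is not the PR's processBodyFragment() code.

```php
<?php
// Requires PHP 8.4+ (Dom\HTMLDocument, Dom\Document::$body, Dom\Element::$innerHTML).
// Hypothetical helper for illustration only; not the PR's processBodyFragment().

function parseBodyFragmentSketch(string $html): Dom\HTMLDocument
{
    // Minimal document shell so that a <body> element exists to act as context.
    $doc = Dom\HTMLDocument::createFromString(
        '<!DOCTYPE html><html><head></head><body></body></html>',
        LIBXML_NOERROR
    );

    // Setting innerHTML parses $html with <body> as the context element,
    // so the fragment is tree-built in the "in body" insertion mode.
    $doc->body->innerHTML = $html;

    return $doc;
}

$doc = parseBodyFragmentSketch('<p>Hello <b>world</b></p>');

// Serialize just the <body> element and its parsed children.
echo $doc->saveHtml($doc->body), "\n";
```

Because <body> is always the context, any <head> or <body> tags inside the input are still handled by the "in body" rules rather than by the rules of their original context, which is the limitation the rename makes explicit.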